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We present general methods for simulating black-box Hamiltonians using quantum walks. These 
techniques have two main applications: simulating sparse Hamiltonians and implementing black-box 
unitary operations. In particular, we give the best known simulation of sparse Hamiltonians with 
constant precision. Our method has complexity linear in both the sparseness D (the maximum 
number of nonzero elements in a column) and the evolution time t, whereas previous methods had 
complexity scaling as D 4 and were superlinear in t. We also consider the task of implementing an 
arbitrary unitary operation given a black-box description of its matrix elements. Whereas standard 
methods for performing an explicitly specified N xN unitary operation use 0(N 2 ) elementary gates, 
we show that a black-box unitary can be performed with bounded error using 0(iV 2/3 (log log JV) 4/3 ) 
queries to its matrix elements. In fact, except for pathological cases, it appears that most unitaries 
can be performed with only 0(vN) queries, which is optimal. 



I. INTRODUCTION 

One of the major applications of quantum computation 
is the simulation of Hamiltonian dynamics. Hamiltonian 
simulation is the basis for simulating quantum systems — 
the original motivation for quantum computers \m — and 
also has applications to quantum algorithms [2|-|5(- 

An explicit procedure for simulating local Hamiltoni- 
ans on a quantum computer was given by Lloyd [&] . This 
result was later substantially generalized to the simula- 
tion of sparse Hamiltonians by Aharonov and Ta-Shma 
■ References [1, Q improved these results by providing 
a simulation scheme with complexity that scales close to 
linearly in the evolution time t and as the fourth power 
of the sparseness parameter D, the maximum number of 
nonzero elements in a column. 

In this paper, we consider the general task of simulat- 
ing Hamiltonians, without necessarily assuming sparsity. 
As particular applications, we provide improved meth- 
ods for simulating sparse Hamiltonians and implement- 
ing unitary transformations. Similar to previous work 
on Hamiltonian simulation we assume throughout 

that the Hamiltonian is specified by an oracle. That is, 
a black-box function computes the matrix element Hjk 
for any desired row index j £E {1,2,..., M} and column 
index k e {1,2, ... ,M}. An algorithm for computing 
the matrix elements of H can be used to construct such 
a black box. In particular, this means that the black-box 
model applies to common physical Hamiltonians such as 
those considered by Lloyd 6]. Such Hamiltonians are 
a sum of local terms, each acting on a limited number 
of subsystems. Thus they are sparse, and there is an 
efficient method for computing the matrix elements of 
the overall Hamiltonian. An advantage of the black-box 
model of Hamiltonian simulation is that it can be ap- 
plied not only to physical Hamiltonians, but as a basis 
for designing algorithms for other problems @, 0, [lj . 

In contrast to previous work on Hamiltonian simu- 
lation 0, most of which is based on Lie- Trotter- 
Suzuki formulae, we use a new approach based on quan- 



tum walks [10]. A limitation of the Lie- Trotter-Suzuki 
approach is that it relies on limiting the error by using 
many short time steps. As a result, the complexity of the 
simulation always scales superlinearly in t. The quantum 
walk approach provides scaling that is strictly linear in t 
[po| . which is known to be optimal Another limita- 
tion of the Lie- Trotter-Suzuki approach is that it relies 
on decomposing the Hamiltonian into 1-sparse matrices, 
which results in poor scaling in the sparseness D. In Ref. 
Q the scaling was 0(D 4 ), which was recently improved 
to (J(D 3 ) [11]Q In contrast, by using quantum walks 
we improve the scaling to linear in D. This represents 
the best known constant-precision simulation of sparse 
Hamiltonians. Furthermore, if we are willing to accept 
superlinear scaling in t, the scaling in D may be improved 
to 0{D 2 / Z ) in general, and in many cases to O(Z) 1 / 2 ). 

The quantum walk approach to Hamiltonian simula- 
tion was proposed in Ref. [l(| , though without providing 
an explicit method to implement the steps of the quan- 
tum walk in the general case. Here we present a complete 
method for Hamiltonian simulation by showing how to 
implement the steps of the quantum walk for a general 
Hamiltonian that may or may not be sparse. We use the 
method of Ref. [l(| together with a range of other tools, 
which we combine and improve on in nontrivial ways to 
obtain our final result. In particular, our approach intro- 
duces the following techniques. 

1. In Sec. IIV1 we modify the method of Ref. [l(j by 
using phase estimation to correct a lazy quantum 
walk, giving a more efficient simulation. 

2. In Sec. [V] we describe how to perform steps of the 
quantum walk using state preparation by ampli- 
tude amplification (similar to a method proposed 
in Ref. |12|). improving the efficiency in the non- 
sparse case. 



We use a tilde to indicate that subpolynomial scaling is ignored — 
that is, / = 6(g) if / = 0(g 1+rl ) for any n > 0. 
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3. We modify the state to be prepared by using an 
ancilla qubit to satisfy an orthogonality condition 
for the lazy quantum walk (compare Eqs. and 
(f2"2")l ). This facilitates more efficient state prepara- 
tion by amplitude amplification in Sec. |V] (outper- 
forming direct application of Ref. [12j), and even 
allows us to prepare the state in only O(l) queries 
in the sparse case described in Sec. IIV1 

4. In Sec. IVII1 we further improve the simulation in 
the non-sparse case by decomposing the Hamilto- 
nian as a sum of terms and recombining these terms 
using Lie- Trotter-Suzuki formulae. The decompo- 
sition depends on the magnitudes of the matrix el- 
ements, giving an approach that is fundamentally 
different from previous applications of Lie- Trotter- 
Suzuki formulae. 

While our results on simulating non-sparse Hamiltoni- 
ans may be of interest in their own right, additional mo- 
tivation for studying this problem comes from the related 
task of implementing general unitary transformations. 
Standard methods for implementing an arbitrary N x N 
unitary transformation on a quantum computer work by 
decomposing it into a product of two-level unitary ma- 
trices pj, Uj and performin g th e two-level unitaries via 
the Solovay-Kitaev theorem (l5l4l7| . This method uses 
N 2 poly(log N) gates. Since counting arguments show 
that Q(N 2 ) elementary gates are required even to ap- 
proximate a general unitary transformation [3, G3, EH , 
such an implementation is nearly optimal. 

Instead of considering an explicit unitary operation, 
we study the problem of performing a unitary transfor- 
mation specified by a black box for its matrix elements, 
similar to the black box for a non-sparse Hamiltonian de- 
scribed above. In such an oracle model, counting argu- 
ments for unitary implementation no longer apply, since 
the black box depends on the unitary. While the question 
of how many queries are required to implement a general 
unitary in this model seems quite natural, to the best of 
our knowledge it has not been studied previously. 

Implementation of unitaries is closely connected to 
Hamiltonian simulation, because one can implement a 
unitary by simulating a related Hamiltonian. Reference 
[n| used this idea to provide a method for implement- 
ing sparse unitaries. However, their approach relies on a 
decomposition into 1-sparse Hamiltonians, so it performs 
poorly in the non-sparse case. 

Using Hamiltonian simulation via quantum walks, we 
show that the complexity of implementing a general (non- 
sparse) NxN unitary scales with TV as 0(N 2 / 3 ). We also 
present numerical evidence that typical unitaries can be 
implemented in only 0(y/~N) queries (although it is pos- 
sible to construct unitaries for which our method uses 
more queries). This is much less than the f2(iV 2 ) elemen- 
tary gates required to implement a general unitary given 
an explicit description instead of a black box. The best 
lower bound we are aware of is Q(yN), because imple- 
mentation of a black-box unitary operation can be used 



to solve a search problem (20j . 

Implementation of black-box unitary transformations 
is closely related to the task of preparing an TV- 
dimensional quantum state given a black box for its am- 
plitudes in a fixed basis. Grover showed that the query 
complexity of this task is Q(y/~N) As mentioned 

above, we build on his technique in order to implement 
black-box unitaries. It is an open question whether black- 
box unitaries can be implemented in 0{^f~N) queries in 
general, or if there is a fundamental separation between 
the query complexity of implementing unitaries and the 
query complexity of preparing states. 

Since implementing unitary transformations is a ba- 
sic task in quantum computation, we expect this result 
to have applications to quantum algorithms. For exam- 
ple, our approach could serve as an alternative to previ- 
ous methods for efficiently implementing general unitary 
transformations on a logarithmic number of qubits, with 
improved performance provided the matrix elements can 
be computed quickly. 

The remainder of this article is organised as follows. In 
Sec. |TT] we give a technical summary of our main contri- 
butions. Then, in Sec. IIII1 we summarise the method of 
Ref. flol | for simulating Hamiltonian evolution. Our main 
result, that a black-box Hamiltonian can be simulated 
with (5(Z? 2 / 3 ) queries to its matrix elements, is proven in 
Sec. lVIIl building on a foundation established in Secs.liVl 
through I VII Some of these intermediate results may be 
of interest in their own right; in particular, in Sec. IIV( 
we present a simple method to perform the steps of the 
quantum walk of Ref. [l(| that is especially suitable for 
sparse Hamiltonians. We explain how Hamiltonian sim- 
ulation can be used to implement black-box unitaries in 
Sec. I Villi In Sec. IIXI we give some examples of the sim- 
ulation as applied to particular unitary operations. We 
conclude in Sec. 1X1 with a summary of the results and a 
discussion of some open problems. 

II. MODEL AND RESULTS 
A. Model 

We formulate the Hamiltonian simulation and unitary 
implementation problems using an oracle model, in which 
a description of the Hamiltonian or unitary is provided 
by a black box. For Hamiltonian simulation, the matrix 
elements of some Hermitian matrix H G C M x are given 
by a black box Oh acting as 

H \j,k)\z) = \j,k)\z®H jk ), (1) 

where j, k G {1, 2, . . . , M}. Similarly, for the problem of 
implementing a unitary U G C NxN , we are given a black 
box Ojj acting as 

Ou\j,k)\z) = \j,k)\z@U jk ), (2) 

where j, k G {1, 2, ... , N}. The error of the simulation or 
implementation must be no greater than S as quantified 
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by the trace distance. In practice, the black box Oh or 
Ojj provides the matrix elements to some finite precision, 
C, using O(log^) qubits. We assume that Q <C <5, so 
the imprecision in this approximation does not affect the 
analysis. 

We can also take advantage of sparsity if it is possi- 
ble to compute the positions of nonzero matrix elements. 
Specifically, suppose there are at most D nonzero ele- 
ments in each row or column. For Hamiltonian simula- 
tion, suppose that in addition to the black box Oh, we 
are given a black box Op acting as 

F \j,k) = \j,f(j,k)) (3) 

for any j E {1,2,..., M} and k € {1, 2, . . . , D}, where 
the function f(j, k) gives the row index of the fcth nonzero 
element in column j (or the row index of any zero element 
when there are fewer than k nonzero elements in column 
j)- 

Note that Of computes the row index in place. In con- 
trast, some previous work on simulating sparse Hamilto- 
nians [t| [ll[ assumes that a single query only computes 
/(j, k) given j and k, i.e., performs the isometry 0' F act- 
ing as 0' F \j, k) — \j,k, f(j,k)). The oracle Of can be 
used to produce such an oracle in one query, simply by 
copying Ho a third register before calling Op- There- 
fore, all the upper bounds for the complexity in 0, 
hold for the oracle Op. Furthermore, the algorithms in 
[1, El do not depend on this aspect of the oracle, so 
changing the oracle in those algorithms would not lead 
to improved upper bounds. Thus our results may be di- 
rectly compared with those of [1, [ll[ . 

To construct the oracle Of, it suffices to first compute 
f(j,k) for a given j, and then compute k given j and 
f(j, k) in order to erase the register encoding k. In con- 
trast, O'p does not uncompute k. For realistic cases such 
as the local Hamiltonians considered by Lloyd @ , deter- 
mining k given j and f(j, k) is not difficult, so Op is a 
realistic representation of the resources used. 

If desired, one can quantify the resources used by our 
simulations in terms of queries to the black box 0' F in- 
stead of to Of. Even if k cannot be computed directly, 
one can implement Of with 0' F using additional queries 
to uncompute k. If the function / provides the nonzero 
elements in sorted order, one can find k by binary search, 
increasing the number of queries by a factor of only log D. 
In general, one can use Grover's algorithm to find k, in- 
creasing the number of queries by a factor of 0(-\/15). 

Another model used by some papers on sparse Hamil- 
tonian simulation is that for any given j, a single query 
reveals all the nonzero entries in the jth column, i.e., 
the values f(j, 1), . . . , f(j, D) @, 0, Q. With that model, 
the black box Of can be implemented using two queries, 
whereas D calls to Of are required to compute all the 
values f(j, 1), . . . , f(j, D). As Refs. @,0)H| are primarily 
concerned with showing polynomial scaling in D, such a 
difference is unimportant. 

We emphasise that in the present work, we do not 
require D — poly (log M); our methods apply for any 



D < M. If the Hamiltonian is not sparse, or if it is sparse 
but the nonzero elements are in unknown positions, then 
we can simply take D — M and let Of be the identity 
operation. Thus, all the results of the paper hold for 
non-sparse cases, with D — M . In particular, when con- 
sidering the problem of unitary implementation, we do 
not assume sparsity, so the black box Of is not required. 

We assume that information about the Hamiltonian 
or unitary can only be obtained by querying the oracle, 
so in particular we do not know the norms of H or U 
(except for the trivial fact that \\U\\ = 1). However, we 
assume that we do have upper bounds on various norms: 
we are given constants A,Ai,A max satisfying A > \\H\\, 
Ai > H-Hlli, and A max > ||-ff|| max , where \\H\\ denotes 
the spectral norm of H, \\H\\x max.,- ^fef=i and 
II -ff II max := max^fc \H jk \. 



B. Results 

Our first main result, proved in Sec. IIV1 is that a 
Hamiltonian can be simulated with scaling linear in both 
||Jf||f and D. 

Theorem 1. For a given Hamiltonian H, let A > \\H\\ 
and A max > ||-ff|| max . Then the evolution under H for 
time t can be simulated with error at most 6 € (0, 1] using 

O (Jj= + DA max t + lj (4) 

queries to Oh and Of. 

This result is suitable for simulation of sparse Hamilto- 
nians. The next main result, proved in Sec. I VII CI gives 
improved scaling in D at the expense of worse scaling 
in ||-ff||t. This result may be preferable for non-sparse 
Hamiltonians. 

Theorem 2. Let A > The evolution under the 

Hamiltonian H for time t can be simulated with error at 
most 8 G (0, 1] using 

O [D 2 ^ [(log log D) Ai] 4 / 3 ^ 1 / 3 ) (5) 

queries to Oh and Of, provided SD > At > \/S. 

Using the correspondence between Hamiltonian simu- 
lation and unitary implementation described in Sec. lVIlIl 
this easily implies our main result on the implementation 
of black-box unitaries (see Sec. I Villi for the proof). 

Corollary 3. A black-box unitary operation U can be 
implemented with error at most 8 G (0,1] using 

O^^QoglogA) 4 /^- 1 / 3 ). (6) 

queries to Ojj , provided SN > ir/2. 
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Although the above results are the best we are able to 
show for general non-sparse Hamiltonians and unitaries, 
we believe that our methods are typically more efficient. 
Theorem [5] and Corollary [3] are based on a decomposi- 
tion of the Hamiltonian into a sum of terms, where the 
nonzero matrix elements of each term have comparable 
size. In the worst-case analysis of Sec. I VII C[ we must 
take into account the possibility that in this decomposi- 
tion, the spectral norms of the individual terms could be 
much larger than the spectral norm of the total Hamil- 
tonian. Numerically, we find that this does not occur 
when selecting matrices at random, only when matri- 
ces are specifically designed to cause this behaviour (see 
Sec. IVIIB|I . Assuming that all terms in the decomposi- 
tion have comparable norms, we show in Sec. IVIEAl that 
the number of queries to simulate a Hamiltonian with 
II-H1I < A is 



O 



((A^y^logD) 7 / 4 ); (7) 



correspondingly, a black-box unitary can be implemented 
using 



O 



(V^(log^) 7/4 



(8) 



queries. 



III. REVIEW OF HAMILTONIAN SIMULATION 

In this section we summarise an approach to Hamilto- 
nian simulation based on discrete-time quantum walks 
[Icj . Throughout, M denotes the dimension of the 
Hilbert space that a black-box Hamiltonian acts on, and 
N the dimension of the space that a black-box unitary 
transformation acts on. To construct a discrete-time 
quantum walk from a given Hamiltonian H, the Hilbert 
space is expanded from C M to C M+1 <g> C M+1 . A step of 
the discrete-time quantum walk is described by a unitary 
operator 



V := iS(2TT 1 - 1) 



(9) 



Here the operator S swaps the two registers, i.e., S\j, k) — 
\k,j) for all j, fee {1,2,..., M + 1}. The operator T is 
the isometry 



M 

i=i 

mapping \j) to \r]j) := \j)\(pj), where 

M 



(10) 



\<Pj) : = 
with 



\H\\ 



k=l 



Ev^l fc > + \/ 1 -^jrl M + 1 



M 

E 

k=l 



H 



jk 



(11) 



(12) 



(cf. Eq. (27) of Ref. pjpj). Here e E (0, 1] is a parameter 
that can be made small to obtain a lazy quantum walk, 

and \\H\\i := maxj Y,Hi\ H jk\- 

The eigenvalues and eigenvectors of V are closely re- 
lated to those of H [2l[ . If we define H to be the operator 
with matrix elements 



Hjk ■= (Vj\ s \Vk), 
then with \cpj) given by Eq. (fTT]) . 

H ^ 



Hi 



(13) 



(14) 



The operators H and H have common eigenstates |A). 
The corresponding eigenvalues for H and H are denoted 
A and A, and are related by 



A = 



eA 

W\ 



(15) 



Each eigenstate |A) corresponds to two eigenvectors of V, 



lA4> := 



1- e 



±2 arcsin A 



s 



2(1 -A 2 ) 



T\X), 



with eigenvalues 



A . I Aii arcsin A 



(16) 



(17) 



Reference [10( describes simulations of H based on the 
quantum walk V. One approach is to use phase estima- 
tion to (coherently) determine the value of A. Introducing 
a phase of exp(— iXt) = cxp(— iAt||i?|| i/e) for each eigen- 
vector | A) simulates evolution under H for time t. More 
specifically, a general initial state has the form 



W) = E^, fc |A,fc), 

\,k 



(18) 



where the index k accounts for degenerate eigenvalues of 
H. Applying the operation T yields 



A,fc 

V'A.fc 



E 

A,fc J2(l-A 2 ) 



[(1 - Xe- 1 
+ (1- Ae 4£ 



—i arccos 



i Am ,.A 



(19) 



where the k) are eigenvectors of V, with the index k 
labeling an orthonormal basis for each eigenspace. Ap- 
plying the correct phase factor for each eigenspace gives 



Te- iHt \1>) =E^ At V>A,fcT|A,fc). 



(20) 



\,k 



Applying then gives e lHt \ip), the desired time- 
evolved state. 
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Phase estimation can provide an estimate of ji± with 
variance approximately (it/d) 2 using d applications of V. 
The variance in A is then 0(\\H \\f /e 2 d 2 ), which translates 
to an error in the state of 0(\\H\\it/ed). If the allowed 
error is 5, then the simulation can be achieved with d = 
0(\\H\\it/S) by taking e = 1. (Note that the S used here 
is the square root of that used in Ref. [1(3], because that 
paper considered a lower bound on the fidelity of 1 — 6, 
whereas we take S to be an upper bound on the trace 
distance.) 



IV. SPARSE HAMILTONIAN SIMULATION 

The simulation scheme presented in Ref. [lol ] quanti- 
fies the complexity in terms of the number of quantum 
walk steps. To simulate a black-box Hamiltonian, we re- 
quire a method to perform these steps. In this section 
we describe a simple approach to this problem, thereby 
providing a Hamiltonian simulation method suitable for 
the sparse case. 

The walk step is composed of two operators, the swap 
S and a reflection 2TT^ — 1. The operation S is easy to 
implement; the difficulty lies in implementing the reflec- 
tion. It is given explicitly by 



A I 



2TT+-I=^|i)01®(2|^>(^|-1) 



(21) 



That is, it is a reflection about conditional on the 
state \ j) in the first register. To perform this reflection, 
it suffices to give a procedure for preparing \ipj) from the 
|0) state: by performing inverse state preparation, reflect- 
ing about |0), and then performing state preparation, we 
effectively reflect about \<fij). 

Black-box preparation of an M-dimensional quantum 
state can clearly be performed in M queries, but this 
would introduce an overall multiplicative factor of M 
in the complexity of the implementation. The over- 
all query complexity of the simulation would then be 
0(M \\H\\it/5). Taking advantage of sparsity reduces this 
to 0(D\\H\\it/S), but Theorem [TJ uses even fewer queries. 

Our improved simulation uses the following insight. 
We modify the state \<pj) to 



\4>j) ■■= 




\k)\o) 



k=l 



1 - 



-10)11) 



,( 22 ) 
That is, we 

with |0) 1 1). 
state in the 



where |£j) is some superposition of the \k). 
append an ancilla qubit, and replace \M + 1) 
The second term of Eq. (|22|). flagged by a |1) 
ancilla qubit, takes the place of the \M + 1) state in Eq. 
(TIT]) . Thus the discrete-time quantum walk takes place 
in C 2M Cg) C 2M , although it is effectively confined to a 
subspace of dimension (M + l) 2 . 

To take account of the fact that ||-ff||i may not be 
known exactly, we replace e with e = eAi/||ff||i, where 



Ai is a known upper bound on ||-ff||i. Then we can al- 
ternatively express the definition of \<j)j) as 



- M . 

-Ev^'^ + v 1 ^ 1 ^ 1 ^ (23) 

k— 1 ' 



1^-) - 



Note that the restriction e < 1 implies that e < Ai / 1| H || i . 
The relation between the eigenvalues A and A can be 
expressed in terms of e as 



A = AAi/e. 



(24) 



Provided e is sufficiently small, we can prepare this 
state using a constant number of queries. In particular: 

Lemma 4. The state \4>j) in Eq. (|23[) can be prepared 
in 0(1) calls to the oracles Oh and Op provided e £ 
(0, A-x/DA max ], where A max > \\H\\ msx and A 1 > 

Proof. First, prepare an equal superposition over |1) to 
\D) in the first register and initialise the ancilla qubit to 
|0), giving 



(25) 



k=l 



Querying the black box Of changes this to 

1 



I^TttE i fc )i°)' ( 26 ) 



where Fa is the set of indices given by Op on input j. 
Next we transform to 



1 



Ei*> 

keFj 



IT* 

X 



10) 



1 - 



X 



1) 



(27) 



where X = Ki/eD. The most important requirement on 
X is that X > A max , so X > \\H\\ max and the amplitude 
for the |0) state has magnitude at most 1. Because of 
the requirement that e < Ai/DA max , taking X = A\/eD 
ensures that X > A max . In addition, for this value of X , 
Eq. (f27|) has the form of \cf>j) for some choice of 

The state \4>j) can be transformed to Eq. (|27p by com- 
puting Hjk in an ancilla register with the black box Oh 
and using this value to perform a controlled rotation on 
the qubit. Applying Oh again uncomputes the ancilla 
storing Hjk- Note that the rotation can be performed 
with error at most S using poly(log j) operations [l5l[T6j. 
but this factor is not included in the analysis since we fo- 
cus on the query complexity. □ 

Next, we improve the Hamiltonian simulation method 
reviewed in Sec. IIIII by combining a lazy quantum walk 
with phase estimation. 

Lemma 5. Let Ait/e 6 Z and e € (0,1]. Evolution 
under H for time t can be simulated using 0(Ait/s) steps 
of the quantum walk defined by the states \<j)j) from Eq. 
with error 0(A 2 e 2 /A 2 ). 
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Proof. Given an arbitrary input state, first we (coher- 
ently) determine the sign ± of ^4 (recall Eq. 

The phase of fi^_ is arcsinA, whereas the phase of fj,_ 
is 7r — arcsinA. Performing phase estimation with one bit 
of precision on V gives probabilities of measuring + or 
— , given that the eigenvalue is ^\ or fi_ , of 



Using 



Pr(+|±) 



Pr(-|±) = 



i± Vi-x 2 



1 =p y/l - A 2 



(28) 
(29) 



The probability of error is therefore 0(A 2 ). Since A = 
eA/Ai and A < A, the error due to misidentification of 
the sign is 0(A 2 e 2 /A 2 ). 

Having estimated the sign, we can apply a lazy quan- 
tum walk more accurately than in Ref. If the sign 
of fj,± is — , we apply V d , where d — Ait/e is an integer 
(chosen so that Xt = Ac?), giving a phase factor of 



•i arc si n 



x ) d = {-l) d e- lXt + 0(dX i ). (30) 



(The sign can be corrected if d is odd.) Similarly, if the 
sign of [i± is +, we apply (V ') d , giving a phase factor 



l X\d 



\y.t 



-iXt 



0(dX 3 ). 



(31) 



In either case, the error in the final state is 

0(d~X 3 ) = 0(X 3 t(e/A 1 ) 2 ) < 0(A 3 t(e/A!) 2 ), (32) 

considerably less than that for the method given in Ref. 
[Tol Theorem 2] for small e. 

We further reduce the error by using an estimate of A 
to correct the lazy quantum walk. After the lazy quan- 
tum walk implements the phase factor e -" !arcsm ^ i the 
estimate of A is used to correct the difference between 
A and arcsinA. With d applications of V, we obtain an 
estimate of arcsinA with standard deviation 0(1/ d) (see 
for example [l(| Theorem 5]). Since A— arcsinA = 0(A 3 ), 
this estimate only improves the accuracy if 1/d is small 
compared to A. If Ad < 1, we do not perform a correction. 
In that case the error is 0(dX 3 ) < 0(X 2 ). On the other 
hand, if Ac? > 1, then the standard deviation in the esti- 
mate of A — arcsin A is 0(A 2 /d), and again the error in the 
final phase is (3(A 2 ). So in both cases, the error in the fi- 
nal state is 0(A 2 ) < 0(A 2 e 2 /A 2 ). The overall number of 
steps of the quantum walk used is 0(d) = 0(A\t/e). □ 

Combining the results of Lemmas 0] and [5] gives the 
improved Hamiltonian simulation method described by 
Theorem [TJ 

Proof of Theorem [IJ We apply the Hamiltonian simula- 
tion described in Lemma [SJ with the method described 
in Lemma U to perform the steps of the quantum walk. 



X = max { \At/Vd] , \A max Dt\ } , (33) 



we take e = A\/DX. This value of X satisfies X > 
A m ax (as we always require for X), which implies that 
the condition e < A\/ DA max of Lemma [4] is satisfied. 
Thus, by Lemma HI the steps of the quantum walk can 
be performed using O(l) queries. 

With this value of e, Ait/e = XDt is an integer, and 
we can use Lemma [SJ Then the error is 

0(A 2 e 2 /A 2 ) = 0(A 2 /D 2 X 2 ) < 0(6). (34) 

The total number of oracle calls is 0(A\t/e) = O(XDt), 
and is therefore 

0(\At/x/5^ + \A max Dt}) = 0(At/x/5+A max Dt+l) (35) 

as claimed. □ 

Comparing this to the simulation of sparse Hamiltoni- 
ans using high-order integrators the scaling is better 
in terms of all parameters except 5. (We assume that the 
upper bounds on norms of H have the same order as the 
norms themselves, so for example A = 0(||JJ||).) The 
number of queries is only linear in ||i?||f, as opposed to 
slightly superlinear. The scaling is particularly improved 
in terms of D, as it is only linear, whereas the scaling in 
Ref. Q was as D 4 . The scaling in 6 is as as op- 

posed to an arbitrarily small power in Ref. However, 
there is an advantage in that for 8 = 0(1/ D 2 ) there is 
no further explicit dependence on D. 

This also improves over the method of Ref. [l(| , which 
uses 0(||.ff||ii/5) steps of the quantum walk. Using a 
naive method for state preparation — simply querying all 
nonzero elements — uses 0(D\\H\\it/8) queries. Theorem 
Q] improves on this as ||i?||i is typically larger than both 
1 1 #11 max and || if ||, and because the scaling with i5 is im- 
proved. In particular, the bound || if ||i < D \ \ if || max 
shows that D\\H\\ x t/5 < 0(D 2 \\H\\ niax t/S), which is 
worse than D||if|| max i, and the bound ||ff||i < \AD||ff|| 
shows that D\\H\\xt/5 < 0(D 3 / 2 \\H\\t/S), which is worse 
than \\H\\t/V8. 

We conclude this section by describing in more detail 
how the isometry T is used in each part of the simulation. 
It is used in three different ways: 

1. At the beginning, to map the initial state from C M 
into the tensor product space C 2M ® C 2M . 

2. To implement each application of V. 

3. At the end, to map the final state from C 2M ® C 2M 
back to C M . 

For the first step, we can directly implement T as de- 
fined in Eq. (fTU|) . The initial state is a superposition 
of states \j) for j E {1,2,...,M} used to control the 
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state preparation. We simply introduce another register 
in which \<j>j) is prepared. 

When T is used to implement V, we need to specify the 
action on the extra qubit introduced in the state prepa- 
ration procedure. We only require the correct eigenvalue 
and eigenvector relations, namely, that 



T*ST\\) = A|A). 



(36) 



On the expanded space, the isometry T should have the 
form 



A I 



[li, 0)1^)0% o| + li, 1)1^)0*. i|] (37) 

2=1 

for some states |f2j). Because |A) is orthogonal to \j, 1) 
(as the initial state has the qubit initialised as |0)), 



A I 



T|A) = ^|i,0)|^)0-,0|A), 

2=1 



(38) 



giving 



M 



T^ST\X) = [O'»O|^>(0jl*.O>|i,O)(*,O|A) 
j,fe=i 

+ <i,l|0fc><n 3 -|fe s O)|i ) l)<fc,O|A)]. (39) 



To obtain Eq. (|36[) , we simply need the second term above 
to vanish. This can be ensured by taking = |fl, 1) 
for any state |f2). Note that it is important to properly 
apply the operator V to states where the ancilla qubit is 
in the state |1), since although the individual T\X) have 
the ancilla in the |0) state, the eigenstates \fi±) of V have 
a component of ST\X), and therefore have a component 
with the ancilla in the |1) state. 

For the final use of T, we wish to map the state 
| j, 0) \4>j) to \j, 0) for each j € {1, 2, . . . , M}. In general, 
this can only be carried out approximately, because the 
final state will not be exactly a superposition of states of 
the form \j, 0)\(f>j), First, if the ancilla qubit is in the state 
|1), this may be regarded as a failure, because the ideal 
final state has the ancilla in the state |0). Otherwise, we 
perform inverse state preparation conditional on the in- 
dex j. Starting from \j, 0)\<f)j), the second register should 
ideally be mapped to the initial state used for the state 
preparation; any other state can be regarded as failure. 
In general, the total failure probability is proportional to 
the error in the inverse state preparation procedure. 



V. IMPROVED STATE PREPARATION 

To further improve our simulations, especially in the 
non-sparse case, we consider state preparation techniques 
based on amplitude amplification |22h24| . 

Our techniques draw from work on black-box state 
preparation. In that problem, we are given an oracle 
Oip acting as 



0^\j)\z) = \j}\z®i>j} 



(40) 



for some quantum state — Y^jLi tne g° a l 1S to 

prepare a copy of Grover showed how to prepare a 
black-box quantum state with only O(VM) queries [Hj]. 
By the lower bound for search [2(J , preparation of an Tri- 
dimensional black-box quantum state requires f2(-\/M) 
queries, so this state preparation scheme is optimal. 

We can use Grover's technique to prepare the states 
from Eq. (|23p and thereby implement the quantum walk. 
This is favorable for large D, in which case state prepa- 
ration based on Lemma [4] alone is suboptimal. By com- 
bining the approach of Lemma |4] with amplitude amplifi- 
cation, we improve on both these approaches, as follows. 



Lemma 6. Let Ai 

(0,1]. Then 



> WHh, A„ 



> \\H\\ 



and e € 



AI 



fc=l 



can be approximately prepared using 



Ai 



(42) 



queries to Oh and Of- The approximation has relative 
error in the weighting of the first term of 0(e). 

Proof. As in the proof of Lemma |4l we can prepare 



m ■= 



i 



keF 3 



H 



f 10) 



1 - 



X 



1) 



(43) 



using one query to Of and two queries to Oh, where X 
is a real number satisfying X > A max > ||i?|| ma x- Let 
Bj denote a unitary operation that prepares \<fx) from 
|0)|0). 

We now use a form of amplitude amplification similar 
to that introduced by Grover [l2j]. We define two reflec- 
tion operators. The first reflects about the |0) state for 
the ancilla qubit, 



R f :=1® (1-2|0)(0|), 
and the second reflects about the state I 



(44) 



(45) 



The latter reflection can be performed by applying Bj, 
reflecting about |0)|0), and then applying Bj. Using an 
appropriate number of these reflections, we could obtain 
a final state close to 



(46) 




-^>^™*>|o>- 



However, the key point is that we do not rotate all the 
way towards this state, but instead prepare 
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where Afj is a normalisation constant. This expression 



corresponds to the definition of \cj)j 

10) « " 



with 




H 



jk\ 



X, 



\k). 



(48) 



According to Eq. (|13[) . the normalised Hamiltonian cor- 
responding to the discrete-time quantum walk defined by 
these states is 



H = 



eH 

aT' 



(49) 



We prepare a state close to \<j>j) using amplitude am- 
plification. Let 



denote the state as a function of the number of steps, r. 
We have 



|fc(r)>=Bin[(2r + l)0j 



+ A^-cos[(2r + l)^] V 1 

k£Fj 



\H 



jk\ 



X 



where 



sm I 



— iaJ\a)>\ — 



\k)\l), (51) 



(52) 



By Eqs. (06} and g^, 
of r that gives the desired outcome is 



3 ^i^Ji \ DA" 

• f \4>j) = y/eaJjAi, so the value 




(53) 



There are several reasons why we cannot perform ex- 
actly r° pt queries. This value may not be an integer, and 
it is j-dependent. Furthermore, since Oj is not known in 
general, the exact value of r° pt is unknown. However, if 
e is small, then the arcsin function can be linearised, and 
we can take 



1 eXD 1 

2 V ~! 2' 



Specifically, we choose 



1 eA max D 
2V Ar 



and 



X = (2r + l) 



2 Ai_ 



(54) 



(55) 



(56) 



Since the number of queries per step is O(l), the total 
number of queries is 0(-\/£A max D / Ai + 1) as claimed. 

Now we analyse the error incurred due to imperfect 
state preparation. First consider the deviation of r from 
r° pt . This deviation results from linearisation of both the 
numerator and denominator of Eq. (|53[) . The argument 
of the arcsin function in the denominator is smaller than 
that in the numerator, so to determine the scaling of the 
error, it suffices to consider the error in the linearisation 
of the numerator. The relative error is thus 



opt 



0(saj/Ax) < 0(e) 



(57) 



since <7j < for all j. 



The effect of the difference between r and r° pt is a 

A 



(50) slightly incorrect weighting of \<f>j) in the final state: 



(^|0 i (r))=sin[(2r 




(58) 



where 



Al sin[(2r + 1)6 j] - 1 




= -i{sin[(2r° pt + 1)0,-] 

+2^cos[(2rf t + l)0 J ](r-r° pt )} 




'-f2^cos[(2rf + l)e j ]( 



(59) 



for some rj nt G [r, r° pt ], where in the second line we have 
used Taylor's theorem. Hence 




2r + 1 
0(e). 



(60) 



In the next to last line, we have used Eq. (|56l) . Hence 
the error in the weighting of the first term in Eq. (|41|) is 
0(e), as claimed. □ 

The state preparation scheme described in Lemma [6] 
introduces additional error in the Hamiltonian simula- 
tion, but this error is well bounded. In particular, we 
have the following. 

Lemma 7. The error in the state preparation scheme of 
Lemma results in an error in the Hamiltonian simula- 
tion described in Sec. 1/771 of Q(||JJ||te). 
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Proof. The actual Hamiltonian being simulated, H', has 
matrix elements 

H' jk = (j,0\(Mr)\S\k,0)\Mr)) 
= (^(r)|fc,O)(j,O|0 fc (r)) 



sin[(2r + 1)6 j] sin[(2r + l)6 k ] 



eH 



Ai 



(l + Xj^l + Xk). 



(61) 



Defining a diagonal matrix x := diag(x!, x 2 , . . . , xm), the 
error in the Hamiltonian is 



\H'-H\\ = 



— (xH + Hx + xHx) 
Ai 



^ 4 H L 2 



Ai 
Ar 



0(e), 



(62) 



where x max := max,, \xj\. In evolving the Hamiltonian 
over time t, we multiply this by a factor of tAi/e, so the 
resulting error is 0(||iZ"||te). □ 



VI. NON-SPARSE HAMILTONIANS 

Now we examine the overall performance of the Hamil- 
tonian simulation algorithm with improved state prepa- 
ration. Multiplying the number of steps of the quantum 
walk by the number of queries required to implement 
each step, we find the following. 

Lemma 8. Given a black-box Hamiltonian H, let A > 



H ,Ai> H 



H\ 



Then H can be 



simulated for time t with error at most 8 € (0, 1] using 



O t 3 / 2 



A max -DAiA 



queries to Oh and Of, provided that 
At > VS, 

A 2 



At > , 

A max AiL> 

A < Ai. 



, and 



(63) 

(64) 
(65) 
(66) 



The restriction (|66|) simply means that A is not unnec- 
essarily large. Because || 1 1 < ||i?||i, we can decrease any 
given A to be at most Ai, provided (|64|) still holds. This 
Lemma provides improved performance in cases where 
D is large. This may mean that D = AI, but we con- 
tinue to perform the analysis in terms of the sparseness 
parameter D for generality. 



Proof. We take 



A^ 



(67) 



This ensures that e < 8/ At, so ||-ff||te < S. The restric- 
tion (f64|) then ensures that e < 1. In addition, (f64|) and 
([66)) ensure that AiAt 2 /i5 > 1, so the ceiling function 
does not affect the scaling, and 



l/e = 0(At/S). 



(68) 



With this value of e, Ait/e is an integer, and therefore 
Lemma [5] shows that 0(Ait/e) quantum walk steps suf- 
fice for the simulation. Then, using Lemma El the num- 
ber of oracle queries for each step of the quantum walk is 
0(yj eA max _D / Ai) , unless this quantity is less than 1, in 
which case the state preparation proceeds without am- 
plitude amplification. 

That case does not alter the result, because the total 
number of queries for the simulation as given by Eq. (|4]) 
in Theorem[T]is less than Eq. (I63|) given the restrictions in 
L emma El Thi s can be shown as follows. First, assuming 
A/eA max D/Ai = 0(1), we have 



VSDA msiK t 



= O 



< O 




(69) 



In the second line we have used Eq. (|68|) , and in the third 



line we have used the condition that \/sA r , 
O(l). Next, 



*D/Ai 



At 

71 



<t 3/2 



Amax-DAiA 



(70) 



using the restriction (|6"5"|) . Finally, combining Eqs. 
and (|65|) shows that the number of queries in Eq. (f63|) is 
at least constant. Thus we find that Eq. (|4]) is less than 
Eq. (|6"3")l . as required. 

For the case where state preparation proceeds via am- 
plitude amplification, we multiply the number of steps 
of the quantum walk (from Lemma [S]) by the number of 
oracle calls for each step (from LemmalH]). Thus the total 
number of queries is 




< O t 3/2 



Amax-DAiA 



(71) 



where we have used Eq. 

Finally, we consider the error in the simulation. Be- 
cause £ < 6/ At, Lemma [7] implies that the error due to 
imperfect state preparation is 0(5). Using Lemma [SJ the 
error due to the quantum walk simulation is 0(A 2 e 2 /A 2 ). 
Using e < 8/ At and y/S < At, this contribution to the 
error is also 0(8). 

The statement of the Lemma requires that the error 
is less than 8, rather than 0(8). However, any multiply- 
ing factor for the error can be absorbed into the big-O 
notation of Eq. ([55]). □ 
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We are interested in improving the scaling with D be- 
yond the linear scaling in Theorem [T] The number of 
queries in Lemma [8] contains \[D, but also depends on 
several other quantities. For simplicity, in this discus- 
sion we assume that A can be replaced with \\H\\, and 
so forth. In the worst case we can have ||-ff|| ma x °c 1 1 -HI I 
and || ||i oc ||i?||vZ). This would yield overall scaling 
of 0((\\H\\t)^ 2 D^ 4 /VS). However, it should be noted 
that this worst case arises from two different factors. 



To have ||-ff|| ma x oc ||J?||, the distribution of the 
magnitudes of the matrix elements should have a 
sharp peak, so there is a row with most of the 
weight on one of the elements. 



1 



2. To have \\H\\i oc ||U||V-D, the magnitudes of the 
matrix elements should be relatively evenly dis- 
tributed. 

If we could ensure that all the nonzero elements had mag- 
nitudes within some constant factor (so there is no sharp 



peak), then we would obtain \\H\ 
a scaling of 0((\\H\\t) 3 / 2 ^/IT/S). 



oc 



#11 /v A giving 



VII. BREAKING UP THE HAMILTONIAN 

We now consider how the simulation can be improved 
by breaking up the Hamiltonian into a sum of terms. 
Although the matrix elements of the Hamiltonian may 
differ over a wide range, the Hamiltonian can be bro- 
ken up into terms, each of which has matrix elements 
of similar magnitude. By combining the evolution un- 
der these Hamiltonians via a Lie- Trotter-Suzuki formula, 
we can expect scaling close to 0{{\\H\\t) 3 / 2 \J D / 5). The 
only problem is that the spectral norms of the individual 
Hamiltonians may be large. First we present a deriva- 
tion showing that, provided the norms of the individual 
terms are not large, then the expected scaling is obtained. 
Next we present numerical results showing that typical 
spectral norms are small, although there are pathologi- 
cal cases with large norms. Finally, we present a general 
method using a number of queries roughly proportional 
to I? 2 / 3 even when the spectral norms are large. 



A. Small norms 

In order to present our result, we define the function 
"break" , which quantifies how much the norm can be 
increased by breaking up the Hamiltonian into parts. Let 



break(iT) := max \\H ab \\/\\H 



where the matrix H is defined by 



rjab 



H 



ita<\H jk \<b, 

if \H jk \ < a or b < \H jk \ 



(72) 



(73) 



In this subsection we suppose that break(TJ) is small. We 
present numerical evidence in Sec. lVII Bl that break(if) < 
1.5 in most cases. From the definition, it is clear that 
break(iJ) > 1. In addition, because \\H ab \\ < \\H ab \\ 1 < 



\\H\\i < \\H\\VD, we have break(ff) < \/D. If break(-ff) 
can be upper bounded by a constant, we obtain a simu- 
lation with scaling close to \AD- 

Theorem 9. Let A > ||F|| and T e [brcak(H), \/D] . 
The evolution under the Hamiltonian H for time t can 
be simulated with error at most 5 € (0, 1] using 



0[y/TDjd{\ogD) 7/4 {At) 3/2 



(74) 



queries 



to Oh and Of, provided 5D > At > \fS. 



Proof. We split the Hamiltonian into L terms, each with 
nonzero elements of approximately the same magnitude: 



(75) 



e=i 



We take the Hamiltonians Hi to include elements with 
decreasing magnitudes: Hi contains elements with the 
largest magnitudes, Hi contains elements with the next 
largest magnitudes, and so forth. We denote the cutoff 
values A t , so Hi = H AlA ^ for £ < L and H L = H 0Al ~^ . 
We take A = A, A L = A/\/D, and A > A x > ■ ■ ■ > A L . 
In examining H e , let AW denote an upper bound on 



H-H^H, Af^ an upper bound on Amax an upper 

bound on ||-fff|| max , and ti the time interval for simula- 
tion of H(. We also let 5# denote the error allowed for 
simulating Hi over a time step of length ti. 



(0 



Because |[ifjjfc| < A^_i, we can take A 
To choose a value of A^ for £ < L, we use 



Ae-L 



M 



\H t \\i < mzxJ2\[H e ] jk \ 2 /Ai 



k=l 
M 



< max V \H jk \ 2 /A e 



k=l 
2 , 



< \\H\VIA 



(76) 



Therefore we can take A\ 
we have 



A 2 /A e for£<L. For £ = L, 



\H f 



< \\H\h < WhwVd. 



(77) 



Since we set A L = A/y/D, we have A^ = A 2 /Ai for 
£ = L as well. This is why we define a value for Al, even 
though it is not used to bound matrix elements. 

The success of the simulation depends crucially on 
the scaling of the norms ||^||- By assumption, T > 
break(iJ), so ||i^|| < T||F||. Because ||i^|| < ||i^||i, we 
can take AW = min{TA, A 2 /A e }. 
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Using Lemma [U the number of queries to simulate Hg 
for time Tg is 



3/, , ^ ^ | < ( A 3/ 2 ./i^A.l, 



(78) 

In this proof we take and <5f to be independent of £. 
To ensure that the number of queries is independent of I 
(so no one term dominates the scaling), we take constant 
ratios At-i/Ag. To satisfy A = A and = A/y/~D, 
we can take A e = AD~ l l %h . Then A^L = Ag„ x = 

AD (.l-£)/2L ) A W = A 2/ A( = AD i/2L^ and the ratiQ be _ 

tween successive cutoffs is Ai-x/Ai = D X / 2L . 

Next we ensure that the conditions of Lemma [5] 
hold. Condition (l66|) follows immediately from AW = 
min{TA, A 2 /A f }. To satisfy conditions (jMl) and (JHSJ), U 
cannot be too small, but it must be small enough that 
the Trotter error is O(S). To achieve this, we choose Tg 
to satisfy 



Tg > max 



L 3/2 A 2£' A£)l + l/2i 



(79) 



In addition, to apply the Trotter formula, t/rg must be 
an even integer. Thus we take 



t 



n 



J" L 3 / 2 AH 2 AD 1 +i/2L t ~) 
1111 \ 15 ' 2T J 



(80) 



For this expression to be well-defined, the denominator 
must be nonzero. For the first term of the minimum, we 
find that 

L 3/2 A 2 t 2 L 3/2 

The first inequality uses the condition At > ^/S and the 
second uses L > 2 (since otherwise we are not breaking 
up the Hamiltonian at all). For the second term to be at 
least 1, we require 



ADt > 2TD~ 1/2L . 



(82) 



If this does not hold, then we perform the simulation 
with Theorem [TJ instead of Lemma|Sl Since A > ||iJ|| ma x, 
we can take A max — A. The condition 3D > At > ^/~5 
implies that Eq. (g]) is O(DAt). Provided Eq. (JH^J is 
violated, we find that we can simulate the Hamiltonian 
with 0(D 1 / 2L ) queries. We will take L oc logD, so the 
simulation uses 0(1) queries, which is no more than Eq. 
([74"]). Thus, for the remainder of this proof, we assume 
that Eq. (Jill) holds, so Eq. ([80]) is well-defined. 

Using Eq. ([80]), we find that Eq. ([79]) is satisfied, and 



A ( % > Ar e > A^F t 



L 3 / 2 AH 



UlH' 



(83) 



The first inequality uses A^ > A and the second uses 
Eq. (Hi). Taking 



(84) 



we obtain A^Tg > y/Se, so Eq. ([64]) is satisfied. Equation 
([65]) follows from 

A^r e > A^T/AD 1+1 ' 2L > {A^f/A^A^D. (85) 

Here the first inequality holds due to the second term of 
the maximum in Eq. (1791) . 

Now we use a Kth order Lie- Trotter-Suzuki integrator 
to combine the simulations of the Hg into a simulation 
of H. The Strang splitting formula [26[ corresponds to 
K = 1; larger values of K correspond to higher-order Lie- 
Trotter-Suzuki formulae. In this proof we simply take 
K = 1; in Section [VII CI we will consider the case K = 2. 
The simulation resulting from a Kth order integrator is 
approximate, introducing error [9j, [2JJ, [28( 



O 



2L5 



K-l 



t max 



1 2K+1 



(86) 



Here r is the time interval over which the integrator is 
repeated. That is, the time is broken up into t/r inter- 
vals, and the same integrator is used on each of those 
intervals. By Eq. (A2) of Ref. @, t, > t x (3/2)3" K . 
Thus, for a fixed value of K , r = 0(jg). Also, the num- 
ber of queries is increased by a factor of 5 K L due to the 
number of terms in the integrator. 

To bound the error, we need to take account of the 
error due to the individual simulations and the error due 
to the Trotter formula. There are 0{Lt/r) terms in the 
Trotter formula for K — 1, so the total error in perform- 
ing the individual simulations (neglecting only the error 
introduced by the Trotter formula) is 0(SgLt/T). Be- 
cause we take Sg = 5T£/L 3 ^ 2 t, the total error due to the 
simulations is 0(5 / L 1 / 2 ), which is 0{5). 

With K — 1 and L cx log£>, the Trotter error from 
Eq. dM]) is 0(L 3 A 3 r 2 t). If S/L 3 / 2 At > T/D 1 - 1 / 21 -, so 
that Eq. (|80[) gives Tg = 0(S/L 3 / 2 A 2 t), then this Trotter 
error is 0(S 2 /At), which is 0(5) because At > S. Alter- 
natively, if 5/L 3 / 2 At < T/D 1+1 / 2i , then the Trotter er- 
ror is 0(AtL 3 T 2 /D 2+1 / L ), which is 0{8L 3 T 2 /D 1+1 / L ). 
With L oc log D and T < VT>, the Trotter error is 0(5). 

The total number of queries is given by ([78]) multiplied 
by Lt/r, so we obtain a simulation using 



O 



(L 7 / 4 (At) 3 / 2 ^ 1 /2+i/(4L) v ^ 7 j^ (87) 



queries. Now taking L cx logD, D 1 ^ — 0(1), so the 
number of queries is as given in Eq. (|74[) . The condition 
8D > At > \J~8~ ensures that this is at least 1. □ 

This theorem holds regardless of whether the norms are 
small. In the worst case we can have T = \f~D, in which 
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FIG. 1. The function break(_H r ) for random Hamiltonians. 
The plusses and squares are the maximum and mean values, 
respectively, obtained for 100 randomly generated Hermitian 
matrices. The crosses and circles are the maximum and mean 
values, respectively, for sets of 100 Hamiltonians composed of 
random unitaries. 

case the D 3 / 4 scaling is again obtained (as at the end of 
Sec. IVI[) . On the other hand, if breaking up the Hamil- 
tonian does not significantly increase the norm, then we 
obtain \pD scaling. 

B. Norms of components 

Now we present numerical results suggesting that for 
typical matrices, the spectral norms of the components 
are small, and break(H) can be upper bounded by a 
constant. If we consider general Hamiltonians, then the 
norms of the components are almost always smaller than 
the norm of the original Hamiltonian. (In this subsec- 
tion, we use "norm" to mean the spectral norm.) We 
tested general Hamiltonians by generating random Her- 
mitian matrices with normally distributed elements. In 
no case was break(ff) more than 1.2, as shown in Fig. [T] 
For large dimension, break(i?) approached 1. 

We expect larger norms for the components when 
breaking up a Hamiltonian derived from a unitary matrix 
as discussed in Sec. I Villi below (see Eq. (|121|l ). This is 
because unitaries can have a large difference between the 
spectral norm and the 1-norm, and the spectral norms 
of the individual components are bounded by the 1-norm 
of H . To test this class of Hamiltonians, we generated 
random unitaries according to the Haar measure. The 
values of break(-ff) were larger than for random Hamil- 
tonians, but were still no larger than 1.5, as also shown 
in Fig. CQ 

One way to generate matrices that do have compo- 
nents with large norms is to perturb the quantum Fourier 
transform. We considered increasing the magnitude of 
the elements with positive real part by 0.01% and de- 
creasing the rest by 0.01%. The value of break(77) (with 



FIG. 2. The function break(_H r ) for a Hamiltonian composed 
of a matrix that has been produced by perturbing a quantum 
Fourier transform. The solid line is y~M for comparison. 

H constructed from this matrix as in Eq. (|121[1 ) was then 
proportional to yM (see Fig. [2]). 

Although this example yields large norms for one split- 
ting, the norms of the components are well-behaved with 
respect to other splittings. For example, a different 
threshold could be used, or we could introduce a smooth 
transition between the two components (i.e., for values in 
some transition region, part of the matrix element could 
go to one component and part to the other). However, 
we suspect that for any particular splitting, one can find 
examples that result in large norms for that splitting. 



C. Large norms 

In this subsection we establish the improved simulation 
described in Theorem [21 without relying on the assump- 
tion that the spectral norms of components remain small. 
We begin with an intuitive description of the method be- 
fore giving the proof. 

We again break the Hamiltonian into components Hi 
according to the magnitudes of the matrix elements. In 
this case, the best available upper bound on the spectral 
norms of the components is A 2 /Ai. Using LemmaJSJ the 
number of queries to simulate each component involves 
a ratio Ai-\/A 2 , in contrast to the corresponding ratio 
Ai-i/Ai when the norms are assumed to be small (com- 
pare Eqs. ([751) and ([57]) ). As a result, the cutoff values 
should be chosen to make Ag-i/A 2 constant, rather than 
to make Ae-i/A? constant. 

In addition, the simulation of Hl should not be per- 
formed via Lemma [51 because that would result in an 
overall scaling no better than that provided by Lemma 
[5J Instead, we use Theorem [1] to simulate Hl. By com- 
paring the number of queries required to simulate Hl 
and Hl—i, this means that (ignoring scaling in quanti- 
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ties other than D) we should have 



IDA 



L-2 



A 2 

A L-1 



DA 



L-l- 



Here the expression on the left comes from using Lemma 
|S] to simulate H L ~i, and the expression on the right 
comes from using Theorem [T] for Hl- This expression 
means that Al-2 ~ DA L _ 1 . The restriction Al-2 > 
Al-i then means that A L -i > D- 1 / 3 . Therefore the 
number of queries is minimised for Al- X ~ Al-2 ~ 
D- 1 / 3 . Then Ag^/Aj w D 1 ' 3 , and the number of 
queries is roughly D 2 / 3 . 

To ensure that Aq is independent of D, we must mod- 
ify the above choices slightly. We choose a small constant 
£, and take A L _i oc D^ 1 / 3 and A e ^/A 2 cx L>V3+2<; it- 
erating gives Al-2 oc D 4 ^ 1 / 3 , Al_ 3 cx D 10 ^ 1 ^ 3 , and so 
forth. The sequence needs to give A independent of D, 
but because the coefficient of £ increases exponentially, 
L need vary only logarithmically in £. 

This approach results in a number of queries to sim- 
ulate each Hg proportional to D 2 ^ 3+ ^ . By choosing 
£ cx 1/logD, D* is O(l). In addition, L varies doubly 
logarithmically in D, which gives a double- logarithmic 
factor in the overall scaling in Theorem [5] In the proof 
below, scaling in all quantities is considered, so it is con- 
venient to define a quantity H that includes D together 
with the other quantities we have omitted here. The scal- 
ing is then given in terms of H, rather than explicitly in 
terms of D. 

In order to show the result rigorously, we need to care- 
fully choose the time intervals, because these are lower 
bounded by the conditions (|64l) and (p5|) of Lemma[51 and 
upper bounded by the need to ensure that the error in the 
Trotter formula is sufficiently small. This is challenging, 
because the bounds for the different components differ 
significantly. The bounds on the Trotter error for Hg de- 
crease with I, so the lower bound on the time interval for 
Hi is greater than the upper bound on the time interval 
for Hl- Thus it is not possible to combine these elements 
in the same Trotter formula while adequately bounding 
the error. To overcome this problem, we use nested Trot- 
ter formulae. We use a higher-order Lie- Trotter-Suzuki 
formula for H2 through Hl , in order to obtain sufficiently 
small error despite the large upper bound on the norm 
of Hl- Then we use the Strang splitting to combine this 
product formula with Hi . 



Proof of Theorem [H The Hamiltonian H is again broken 
into L pieces as in Eq. ([75]l. again with H e = H^ 1 - 1 
for £ < L and Hl = H QAl - 1 . For £ < L, the norms are 
upper bounded as 



\H t \\ < \\H e \\i < \\H\\ 2 /A ( 



(89) 



The spectral norm of Hl can be bounded more strongly: 



\H L \\ = 



H-Y^Hg 



i=i 

< \\H\\ + WH^/Al-l 



(90) 



Therefore, we can take A< £ ) = A 2 /A e for £ < L and 
A( L ) = A + A 2 /A L -i. We can also take A^ = A 2 /A ( 
for £ < L, but for £ = L the best available bound gives 
A^ L) = AVI). We have A^L = A^ x . 
For k > 1, let 



.4 



where 



L-k 

N : 



= A/H 



l/3-(3x2 fe - 1 -2)£ 



SD 
LA? 



6(3 x 2 L ~ 2 - 1)' 



(91) 

(92) 
(93) 



With this choice, A — A and A L -i = A/N 1/3 ~£; unlike 
in Section IVII A[ the ratio between successive cutoffs is 
not constant. We break H into 



log 2 



2 fSD 
9 l0g {M 



(94) 



pieces. 

Note that if SD/At < e 3 , then L = 1, and we do not 
break up the Hamiltonian; we simply simulate H using 
Theorem [TJ Recall that by assumption, SD > At > y/S. 
Therefore, DAt > At/V8 > 1. Since A > ||iJ|| ma x, this 
shows that Theorem [T] uses O(DAt) queries. Assuming 
SD/At<e 3 , we have 



DAt < DAt 



e 3 At 



1/3 



eD 



2/3 



DS 

(At) 4 / 3 



£1/3 

0{D 2 ' 3 [(log log D)At] i / 3 5- 1 / 3 ) 



(95) 



This establishes Theorem [2] when 8D/At < e 3 . In the 
remainder of the proof, we assume that SD/At > e 3 , so 
L > 1. It can also be shown that this implies H > 1. 

For £ < L, Lemma [8] lets us simulate the Hamiltonian 
Hg for time Tg using 



O 



3 / 2 < 



a^Lda^aw 



(96) 



queries. Conditions (|64|) and (1651) of Lemma [8] are satis- 
fied provided Tg is sufficiently small; we verify this below 
when choosing Tg in the analysis of the Trotter error. The 
condition (|66l) is trivial for £ < L. We set 5g — Srg/Lt to 
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ensure that the contribution to the error from the simu- 
lations is 0(5). Thus for i < L, the number of queries 
used to simulate He for time n is 



O A Tj. 



lLDAt-it 



6A 2 



(97) 



A simple calculation shows that 

A 4 - 1 /A^ = ^ 3+ai /A. (98) 
Thus the query complexity of simulating He for t < L is 



(d 2 / 3 At^ ( 



LAt 



1/3N 



(99) 



To simulate i?£ for time tl , we apply Theorem [TJ at 
a cost of 



O 



A 2 y/LFFE 



a l ^Vs 

queries. By a simple calculation, 



DA l _ x t l + 1 



DA L ^r L =D^Ar L ^ ^ , 



(100) 



(101) 



so the query complexity of simulating H^ is also given by 
(IM|) provided the second term of Eq. (|100l) is dominant. 
We verify this after choosing tl below. 

Now we analyze the Trotter error. We use a two-step 
process to combine the terms of H . First we use a Trotter 
formula for the two components Hi and ^2i =2 He- Then 
we combine the terms of X^=2 ^ using another Trotter 
formula. We do this because large time steps are needed 
for Hi, but its norm is small, whereas the time steps 
for the remaining He can be smaller, but the norms are 
larger. 

To combine Hi and X^=2 ^i, the minimum time step 



is set by the restrictions (A^'Tt > ^/6e) and (|65|) 
(n > AW/A^LA^D) for Hi. For general £, using the 
choice Se — Srg/Lt, we see that these restrictions are 
satisfied provided 



Te > max ■ 



AW 



L(AW)H' A W x A?Dj 
For t < L, a simple calculation shows that 
A W i 



(102) 



(5 



LA 2 < 

5 



N -2/3-(3x2 i - £ -2)e 
X-6(2 L - e -l)i 



L{AW)H 

where in the third line we have used 

AW _ A 
A ~ A t 



(103) 



(104) 



Therefore, since H > 1, the first term of Eq. (|102[) is 
larger than the second, and it suffices to take 



n > 



SA 2 



L{A( f )) 2 t LAH' 



(105) 



Since K > 1 implies Ag < Ai-\, this lower bound de- 
creases with increasing I. Thus it suffices to ensure that 
Ti is sufficiently large. Here we use the K — 1 integrator, 
so the ratio t/ri must be an even integer. We can achieve 
this, and ensure t\ > SA 2 / LAH, by taking 



t 



Tl 



t 2 LA 4 
ISA? 



fj 



(106) 



This expression is finite because 
t 2 LA 4 



25Aj 



(At) 2 LH 1 /3+2? £N l/3+« 
; > > 1. 

26 2 



(107) 



The equality uses Eq. (|98|) to compute Ai in terms of 
Aq = A, the first inequality uses the assumption At > 
VS, and the last inequality uses H > 1 and L > 2. 

The norms of the two components in the Trotter for- 
mula are bounded as 



l-ffill < \\H\\ 2 /Ai<A 2 /Ai, 



1=2 



< \\H\ 



\H\\ 2 /Ai < 2A 2 /A U 



(108) 
(109) 



where the second line uses Ai < Aq = A. Thus, by Eq. 
(|5o| . the Trotter error for combining Hi and J2e=2^ e 
with a K — 1 integrator is 



O 



t 2 AH 
A\ 



< 0(6), 



(110) 



where in the last step we have used 6 < At, which follows 
from >/6 < At and 6 < 1. 

Next we combine the He with I > 1, giving a simu- 
lation for time t\. We assume that L > 3 so there are 
at least two such terms to combine; then 6D/At > e 12 . 
By Eq. (|105j) for £ = 2, the conditions of Lemma [8] are 
satisfied if we use time intervals of at least 6A\j 'LAH. 
However, we must choose time intervals that are com- 
patible with the form of the integrator. In this case, we 
use the K = 2 integrator (see Section fVII A[) . which in- 
volves using two different intervals denoted and ■ 
We use the same pair of intervals for all £ > 2. 

The integrator requires intervals of the form 



.(i) 

2 

.(2) 



P2Tl/2v 

: (4p 2 - \)t x /2v 



(111) 

(112) 
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for some positive integer v, where p 2 '■= 1/(4 — 4 1 / 3 ). 
Since y»2 < 4P2 — lj tJ > T -^\ so to satisfy the conditions 
of Lemma [SJ it suffices to ensure that r 2 (1) > 8A 2 /LAH. 
We enforce this by choosing 



P2T1 



LAH 



2SA 2 



This is a positive integer because 

P2 r 1 LA 4 t > p 2 Aj K 1 ^ 



2SA 2 



2A\ 



2(4-4V3) 



> 1. 



(113) 



(114) 



The first inequality uses 7*1 > 5A\/LAH. The final in- 
equality holds since SD/At > e 12 , as discussed above. 

With this choice in hand, we can now verify that 
the second term of Eq. (I10Q|) is dominant. Henceforth 
we omit the superscripts on the time intervals, as they 
only differ by a multiplicative constant. Since A2 — 



A/N 



1/4+34/2 



TL 



= T 2 = 



e(N : 



>/DA), and the sec- 



ond term of Eq. ([TOO]) is 

dal-itl = e^L-iK^-^/A) = e^ 1 / 6 -^). 

In comparison, the first term is 



(115) 



e ah 1 / 3 ^- 



A 2 ^Lt¥£ 

A L ^VS \ VAX 



A 



(116) 



which is smaller than (|115[) since K > 1. We claim that 
the third term of Eq. (|100l) can also be neglected. To see 
this, first note that the choice of L in Eq. (|M)l ensures 
that £ < 1/logH, so < e. By Eq. (fTT5|) . this implies 
that DAl^tl = 9(H 1 / 6 - 2 «) = 0,(1). It follows that Eq. 
(|99P also gives an upper bound on the number of queries 
needed to simulate Hl for time tl. 

Now we analyze the error in the Trotter formula for 
^2g =2 Hg. The norm of the Hg for £ > 2 is largest for 
£ = L, in which case we have the bound 

\\H L \\<\\H\\ + \\H\\ 2 /A L ^ 

= O (ah 1 / 3 ^) . (117) 

By Eq. (|86p . the error in the K = 2 integrator is 



O 



( 


' Lt 2 A 2 ' 






[ A L -i _ 






LS 4 A 8 2 ^ 5 / 3 -^ 
A^H 3 

(At)3Hl/3+17« 



" s ' 


8/3 L 4/3\ 


M. 


D 1 / 3 J 



< 0(6). 



(118) 



In the last line we have used the assumption S < At and 
Eq. ([M]) for L, which shows that L = O(loglogD). 



So far we have only considered the number of queries to 
simulate the individual Hg. For the complete simulation, 
there is an additional factor of L to take account of the 
integrators. Therefore, the total number of queries is 



O 



D 2 / 3 (LAt)^ 3 ^ 
5 1 / 3 



(119) 



As discussed above, < e, so this factor can be ignored. 
Overall, we find that 

/L» 2 / 3 [(loglog£>)At] 4 / 3 \ , x 

{ V/3 J (120) 

queries suffice for the simulation, as claimed. □ 



VIII. IMPLEMENTATION OF UNITARIES 

Next we explain how to implement a unitary transfor- 
mation using the results for simulation of Hamiltonians. 
A simple way to implement a unitary transformation U, 
as proposed by Jordan and Wocjan [l9| (and indepen- 
dently observed by one of us), is to simulate the Hamil- 
tonian 



H 



U 

W 



(121) 



The Hilbert space consists of a qubit tensored with the 
target space. Since H 2 = 1, we have 



-iHt 



cos(i)l — ism(t)H, 



and applying this Hamiltonian for time t 
the evolution 

e-^/a|i)|^) = _ i | )£/|V;) l 



(122) 
ir/2 yields 

(123) 



which is sufficient to implement U. 

Properties of the unitary U and its associated Hamil- 
tonian in Eq. (|12ip are closely related. The dimension 
of the Hamiltonian, M , is simply twice the dimension of 
the unitary, N. In addition, we have 



pT|| = ||tf||=l, 
||IT||i=max{||tf||i, 

|-ff||max = ||f Umax- 



l^lll}, 



(124) 
(125) 
(126) 



We assume that the matrix elements of U are given by 
an oracle Ou as in Eq. @. This oracle can trivially be 
used to construct an oracle On for the Hamiltonian as in 
Eq. U) . Each call to Oh uses one call to Ojj , so black-box 
Hamiltonian simulation results can be applied directly to 
black-box unitary implementation. However, for unitary 
implementation we can take advantage of the fact that 
the eigenvalues of the Hamiltonian arc restricted. 

Lemma 10. Suppose H has eigenvalues ±1 and 
7r/[2 arcsin(e/Ai)] is an odd integer, for e £ (0, 1]. Using 
a guantum walk with states \<pj) as in Eq. (|23|) . evolu- 
tion for time 7r/2 can be simulated exactly using 0(A\/e) 
queries. 
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Proof. Since the eigenvalues of H are A = ±1, the rela- 
tionship between arcsinA and A is simple: taking 



d = 



2 arcsin(e/Ai) ' 



(127) 



the eigenvalues of V d are +i for A = 1 and —i for A = —1. 
These eigenvalues are equivalent (up to the minus sign) 
to evolution under the Hamiltonian H for time tt/2. □ 

This result can be used to exactly implement unitary 
operators via a quantum walk. The scaling is as follows: 

Theorem 11. Given a black-box unitary U , let A max > 
H^ll max- Then U can be implemented exactly with 
(AT A max ) queries to Ojj ■ 

Since we are primarily concerned with implementation 
of general unitaries, which are not sparse, we express 
unitary implementation results in terms of the dimension 
N rather than the sparseness parameter D. 

Proof. This implementation proceeds by simulating the 
Hamiltonian given in Eq. (| 12 1[) for time 7r/2 using Lemma 
I10[ with the steps of the quantum walk implemented us- 
ing Lemma 2J The Hamiltonian has no more than N 
nonzero elements in any row of column, so we can take 
D = N. 

Take e = A 1 /NX, where 



X = 



1 



d = 2 



Nsin[ir/(2d)Y 
n 



4arcsin[l/(AmaxA0] 



(128) 
(129) 



It is easily shown that X > A max , so e < A\/DA max < 
1. In addition, tt/[2 arcsin(e/Ai)] is an odd integer, so 
the conditions of Lemma [10] are satisfied. Then, using 
Lemma QUI the Hamiltonian can be simulated for time 
7r/2 using 0(Ai/e) = O(NX) steps of the quantum walk. 

Because s < Ai/DA max , we can use Lemma 2] and 
each step of the quantum walk can be implemented us- 
ing 0(1) queries. Thus the total number of queries is 



O(NX). Because A„ 



> 



> l/VN, X 



2A max , and the number of queries is 0(AfA max ). 



< 
□ 



Our other results on Hamiltonian simulation can also 
be used to implement unitaries, although in these cases 
there are other sources of error, so the simulation can 
no longer be performed exactly. In each case we take 
t = 7r/2, \\H\\ = 1, and D = N. Lemma [5] yields the 
following corollary for unitary implementation. 

Corollary 12. Given a black-box unitary U, let A\ > 



1 1711 1 and A,, 



>\\U\\ 



Then U can be implemented 



with error at most S £ (0,1] using 



O (^\J A max AAi/(5 



(130) 



queries to Ojj ■ 



Proof. We apply Lemma [8] together with the norm 
bounds in Eqs. dUg) to (|T2l)|) . We use t = tt/2 and 
\\H\\ = 1 to obtain Eq. (jl30]) . We omit conditions (j6l)) 
to (156")) of Lemma [5] as they are automatically satisfied. 
First, the condition (l64l) holds because S < 1. Second, 
(p5l) holds because A max > l/\/N and Ai > \\U\\i > 1. 
Third, flU holds because A x > 1 = A. □ 

Similarly, Theorem [9] yields the following. 



Corollary 13. Let T <E [break(C7), VA] . The unitary 
operation U can be implemented with error at most 5 £ 
(0, 1] using 



O (y/TN/S{\ogN) 7/4 



(131) 



queries to Oh and Of, provided SN > tt/2. 



Proof. We use Theorem M with A = = 1, t = tt/2, 
and D — N. Then the number of queries is as in 
Eq. (fT5T|) . Since \\H\\t = 7r/2, the condition SD > At > 
in Theorem [9] becomes 5N > tt/2 > y/5, and since 
5 < 1, the latter inequality is trivial. □ 

Finally, Theorem [5] yields Corollary [31 which can be 
proven as follows. 

Proof of Corollary^ For unitaries, A = \\H\\ — 1, t — 
tt/2, and D — N, so Eq. ([5]) becomes 



of^aogiogiv) 4 /^- 1 / 3 



(132) 



queries. The condition 5D > At > yj~8 becomes SN > 
tt/2 for the same reason as in the proof of Corollary [T3l 
above. □ 

In the worst case, Corollary 1121 yields query complex- 
ity of 0(N 3 / 4 /VS). This is because A max could be as 
large as 1 and Ai could be as large as y/N. On the 
other hand, if the nonzero matrix elements are of simi- 
lar magnitude, then A max oc 1/Ai, so the scaling will be 
0(y/N/6). Alternatively, if it is possible to break the 
unitary into components without otaining large spectral 
norms, then break(CZ) = 0(1), and Corollary [TBI yields 
scaling of 0(\J N / 8). Those results are not sufficient to 
prove this scaling for all unitaries, because break(C/) may 
be large. However, in general we can use Corollary [3] to 
implement any unitary with 0(A^ 2 / 3 (5 -1 / 3 ) queries. 



IX. EXAMPLES 

We now consider some simple examples of unitaries 
and discuss the query complexity of implementing them 
by the methods of the previous section. 

First, consider the unitary with matrix elements Ujk = 
g(j + k mod AT), where g is a black-box function for a 
search problem with a unique marked item j* . The func- 
tion g: {1,2,...,^} -> {0,1} satisfies g(j*) = 1 and 
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g(j) — for j 7^ j*. Since U\0) — \j*), implement- 
ing U solves the search problem; thus it requires Q(y/N) 
queries [2(j- This unitary has ||f7||ma« = 1 and \\U\\i = 1, 
so Corollary [T2l gives a complexity of 0(y/N/S), which 
is optimal. In fact, simply implementing the isometry 
T solves the search problem, because it can prepare the 
state T|0) = |0)|j*). The implementation of T in this 
case is in fact equivalent to the standard Grover search 
algorithm [25[. 

In this case, Oj is known, so the implementation can be 
performed exactly. Using Lemma 1 1 01 unitaries may be 
implemented exactly using a quantum walk, so the only 
remaining source of error is in performing the steps of the 
quantum walk. From the proof of Lemma [6j the steps of 
the quantum walk may be performed exactly if r° pt from 
Eq. (1531 is a known integer. Because we have a 3 ■ = e = 
Ai = 1, we can easily adjust X to ensure that this is 
the case, and therefore that the simulation is performed 
exactly. More generally, whenever U is a permutation 
matrix, it can be implemented in only 0(y/~N) queries in 
a similar fashion. 

Another simple example is the quantum Fourier trans- 
form, the unitary with U jk = e 27Tl]k / N /y/N. For this 
unitary, ||[/|| max = 1/y/N and ||£/||i = y/W. Therefore, 
Corollary again gives a complexity of 0(y/N/$). In 
fact, for this case we can take e = 1, (jj = Ai = y/N and 
X = 1/y/N, so Eq. (|53| gives r° pt = 0. Therefore no am- 
plitude amplification is required, and the implementation 
is again exact. 

These two examples illustrate the two extremal cases 
where Corollary [T2] gives scaling of y/N. First, if all 
the weight is on one matrix element in each row, then 
\\U\\i = 1. At the other extreme, if the weight is 
evenly distributed between the matrix elements, then 
= y/N, but ||J7||max = 1/VN. These correspond to 
the two points listed at the end of Sec. lVIl In either case, 
the nonzero matrix elements have the same magnitude. 

Note that to take advantage of sparsity, the locations of 
the nonzero elements must be known (or more precisely, 
their locations must be accessible via the oracle Of)- Ef- 
fectively, the quantity D measures how many matrix ele- 
ments are not known to be zero. For the search problem, 
the locations of the nonzero elements are not known in 
advance (finding those positions would in itself solve the 
search problem), so D = N . In contrast, for the norms, 
it does not matter if the nonzero elements are in known 
positions. If there are m nonzero matrix elements, then 
ll^lli < v 7 ™ regardless of the positions of those elements. 

For Corollary [15] to yield scaling worse than y/N, 
the distribution of magnitudes of matrix elements of U 
must have a sharp peak in combination with a relatively 
broad distribution for the remaining elements. As a 
natural example of this, consider the unitary given by 
U = eyL~p(—iirJ x /2), where J x is the ^-rotation operator 
for a spin-J system, with dimension N = 2 J + 1, and 
we use the basis of J z eigenstates. The first column of 
exp(— i-n J x /2) has a relatively narrow peak, whereas for 



0.25 




; 

FIG. 3. The matrix elements of U = exp(— in J x /2) in the 
basis of J z eigenstates for J = 100. The separate points are 
|0'|£/|0)|, and the solid curve is \(j\U\ J}\. 

columns towards the middle the elements are more spread 
out (see Fig. |3]). The maximum element of U has abso- 
lute value VT2|Tf)!/2 rJ1 \J] !, which is 0(J -1 / 4 ) by Stir- 
ling's formula. Since || U\\ i = 0(y/l), Corollary QJ yields 
an overall number of black-box queries of 0(N 5 ^ 8 /S 1 ^ 2 ). 

Thus Corollary [T2l does not provide y/N scaling in this 
case, though the scaling is better than the worst-case 
N 3 / 4 scaling. Using Corollary[3]would not yield improved 
scaling in this example, because 2/3 > 5/8. However, nu- 
merical testing indicates that breaking this unitary into 
components does not increase the spectral norms, so us- 
ing the approach given in Sec. IVII AI would yield y/N 
scaling. In this example, calculating the matrix elements 
of U is nontrivial, and consequently the overall complex- 
ity of the algorithm in terms of elementary gates would 
be greater than y/N. 

We emphasise that the motivation to implement uni- 
taries is not as a shortcut to simulation of Hamiltonians 
via U — er lHt . In general, calculating the matrix ele- 
ments of U = e~ zHt given the matrix elements of H may 
be difficult. Rather, the motivation for implementing 
unitaries is to provide a tool to develop other algorithms. 
As discussed above, the search problem may be encoded 
as a unitary operation. The algorithm for implementing 
unitaries may be regarded as a new generalisation of the 
Grover algorithm. 



X. CONCLUSION 

We have shown how to use quantum walks to simu- 
late black-box Hamiltonians. In particular, we showed 
that these techniques can be used to implement an arbi- 
trary N x N unitary transformation using 0(N 2 / 3 /5 1 / 3 ) 
queries to a black box for its matrix elements, with error 
at most S as quantified by the trace distance. 

Our approach is based on simulating Hamiltonian dy- 
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namics via discrete-time quantum walk [lOj], combined 
with state preparation via amplitude amplification (l2j 
and integrators to break up the Hamiltonian. In many 
cases the implementation can be performed even faster, 
with 0(yjN/8) black-box calls. This scaling can be 
achieved except when breaking the Hamiltonian into 
a sum of terms yields components with large spectral 
norms, and numerical testing suggests that such cases 
are rare. 

For many applications, our work provides the best 
known simulation of sparse Hamiltonians. The number 
of queries is strictly linear in ||.ff||t, rather than slightly 
superlinear, as when higher-order integrators are used 
[1, IH . In addition, the scaling in the sparseness param- 
eter D is at worst linear, in contrast with the 0(Z? 4 ) 
scaling of Ref. [9j . 

The best lower bound we know for black-box unitary 
implementation is O(yiV) queries, because implementing 



an N x N unitary suffices to solve unstructured search 
with N items. It remains an open problem to determine 
whether it is possible to perform the simulation using 
0(\/ r N) queries in general. 

Our results also apply to more general Hamiltonian 
simulation problems. It might be interesting to investi- 
gate the extent to which the general black-box Hamilto- 
nian simulation described by Theorem[5]can be improved. 
Simulations using 0(||JJt||) queries are not possible in 
general [3(| , but the tradeoff between quantities such as 
D, \\Ht\\, \\Ht\\i, \\Ht\\ max , and S is poorly understood. 
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